How Thompson Sampling Was Adopted - NISHIO Hirokazu's Scrapbox (Auto-translated from Japanese)

How Thompson Sampling Was Adopted

Q: So you mean trying more of the prompts tied to the images you thought were good?

A: Yes.

The explanation below is a bit of a jumble; I'll expand it later.

Naive solution: reuse N previously used prompts that seem to produce good images.

Done naively, this does not work well.

This is the same setup as the reinforcement-learning problem called the multi-armed bandit problem: "There are N slot machines; which one is best to play?"

In this setting, always "selecting the machine with the best past performance" is a poor strategy: a machine we are pessimistically mistaken about (one that did badly early on by bad luck) will never be selected again, so the mistake is never corrected.
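A minimal simulation of this failure mode, using made-up numbers: two hypothetical arms with true success rates 0.3 and 0.8, and one unlucky first round in which the better arm (arm 1) failed while arm 0 succeeded. The greedy rule never returns to arm 1, because its estimate is stuck at 0 while arm 0's estimate always stays above 0.

```python
import random

def greedy_choice(successes, pulls):
    """Return the arm with the highest observed success rate.

    Assumes every arm has been pulled at least once.
    """
    rates = [s / n for s, n in zip(successes, pulls)]
    return max(range(len(rates)), key=lambda i: rates[i])

def run_greedy(true_probs, successes, pulls, steps, seed=0):
    """Play `steps` rounds greedily, updating the counts in place."""
    rng = random.Random(seed)
    for _ in range(steps):
        arm = greedy_choice(successes, pulls)
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        pulls[arm] += 1
    return successes, pulls

if __name__ == "__main__":
    true_probs = [0.3, 0.8]  # hypothetical: arm 1 is actually much better
    # One unlucky first round: arm 0 happened to succeed, arm 1 happened to fail.
    successes, pulls = [1, 0], [1, 1]
    run_greedy(true_probs, successes, pulls, steps=1000)
    print(pulls)  # [1001, 1]: arm 1 is never tried again
```

Arm 0 keeps at least one success, so its estimated rate never drops to 0, and arm 1's pessimistic estimate of 0/1 is never revisited.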

Because of the "exploitation vs. exploration trade-off," even machines with poor records should be explored in moderation.

There are a number of ways to accomplish this, of which Thompson sampling is one.

I chose Thompson sampling because, in this use case, feedback on whether a generated image is good or bad does not arrive in real time.

A deterministic algorithm such as UCB1 would keep choosing the same option until feedback arrives, so it is not suitable for this use case.

Thompson sampling is stochastic by nature, so its choices vary even while feedback is pending, and the delay is not a problem.
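A sketch of why delayed feedback matters, assuming binary (GOOD / not GOOD) rewards and a batch of 20 choices made before any ratings come back. The counts are hypothetical. UCB1's score depends only on the counts, so with frozen counts it returns the same arm every time; Thompson sampling draws from Beta posteriors (uniform Beta(1, 1) prior assumed), so the batch naturally mixes arms, weighted toward the promising ones.

```python
import math
import random

def ucb1_choice(successes, pulls):
    """UCB1: a deterministic function of the counts.

    Assumes every arm has been pulled at least once.
    """
    total = sum(pulls)
    scores = [s / n + math.sqrt(2 * math.log(total) / n)
              for s, n in zip(successes, pulls)]
    return max(range(len(scores)), key=lambda i: scores[i])

def thompson_choice(successes, pulls, rng):
    """Thompson sampling with Beta(1 + successes, 1 + failures) posteriors."""
    draws = [rng.betavariate(1 + s, 1 + n - s) for s, n in zip(successes, pulls)]
    return max(range(len(draws)), key=lambda i: draws[i])

if __name__ == "__main__":
    # Ratings arrive only after the whole batch, so the counts stay frozen
    # while the arms for this batch are chosen.
    successes, pulls = [3, 2, 1], [5, 4, 3]  # hypothetical counts
    rng = random.Random(0)
    ucb_batch = [ucb1_choice(successes, pulls) for _ in range(20)]
    ts_batch = [thompson_choice(successes, pulls, rng) for _ in range(20)]
    print(set(ucb_batch))  # a single arm, repeated 20 times
    print(set(ts_batch))   # typically several different arms
```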

The natural next step would be to explain Thompson sampling with each whole prompt treated as one slot machine, but I actually skipped that step and went further.

Instead of treating a whole prompt as a single slot machine, I treat it as a combination of parts: a separate decision is made for each part, and the results are combined.

Specifically, img2prompt builds a prompt by producing a BLIP caption and retrieving parts such as "painter," "painting style," and "flavor" via CLIP search, then combining them.

So I rewrote the code to output the parts before they are combined.

The BLIP caption, painter, style, and flavor are each selected by Thompson sampling and then combined.

This assumes that the parts contribute to image quality independently. The assumption is of course incorrect, but it is useful as an approximation.
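A minimal sketch of part-wise Thompson sampling under this independence approximation. The class, the part names, the candidate strings, and the ", " join are hypothetical illustrations, not the actual img2prompt output format. Each candidate of each part keeps its own Beta posterior, and a single GOOD / not-GOOD rating updates every candidate that appeared in the rated prompt.

```python
import random

class PartwiseThompsonSampler:
    """Pick each prompt part independently with Beta-Bernoulli Thompson sampling.

    Treats each part as contributing independently to whether an image is
    judged GOOD (an approximation). All names here are hypothetical.
    """

    def __init__(self, candidates_by_part, seed=None):
        self.candidates = candidates_by_part
        # [alpha, beta] per candidate, starting from the uniform prior Beta(1, 1)
        self.params = {part: {c: [1.0, 1.0] for c in cands}
                       for part, cands in candidates_by_part.items()}
        self.rng = random.Random(seed)

    def sample_parts(self):
        """For each part, draw from every candidate's posterior and keep the best draw."""
        chosen = {}
        for part, cands in self.candidates.items():
            draws = {c: self.rng.betavariate(*self.params[part][c]) for c in cands}
            chosen[part] = max(draws, key=draws.get)
        return chosen

    def update(self, chosen, good):
        """Credit (or blame) every candidate that appeared in the rated prompt."""
        for part, c in chosen.items():
            self.params[part][c][0 if good else 1] += 1

def build_prompt(chosen):
    # Joining the parts with ", " is an assumption about the prompt format.
    return ", ".join(chosen.values())

if __name__ == "__main__":
    sampler = PartwiseThompsonSampler({
        "caption": ["a cat sitting on a windowsill", "a lighthouse at dusk"],
        "painter": ["painter A", "painter B"],  # placeholder names
        "style": ["watercolor", "oil painting"],
        "flavor": ["whimsical", "highly detailed"],
    }, seed=0)
    parts = sampler.sample_parts()
    print(build_prompt(parts))
    sampler.update(parts, good=True)  # the generated image was judged GOOD
```

Because each part is updated separately, a painter that often shows up in GOOD images gains weight regardless of which caption it was paired with; this is exactly where the independence assumption does its work.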

Because keywords produced by img2prompt are used to build prompts, the prompts sometimes contain words I don't know.

For example, see "What does whimsical mean in this image prompt?" (I looked the word up in the dictionary yesterday because I didn't know it).

Historically, the first step was to write and run code that randomly selects good components and combines them into a prompt.

At that time, the "good keywords" were found by statistically analyzing the prompts of past liked images to list frequently appearing keywords; a human then reviewed the list and chose the top ones.

Later, this "analyze, pick the top n, and use each with probability 1/n" step was replaced by Thompson sampling.
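For contrast, the earlier scheme can be sketched as follows, assuming keywords are comma-separated within a prompt. The function names and sample prompts are hypothetical. In the original process a human reviewed the frequency list before picking; here `top_keywords` simply takes the n most frequent keywords.

```python
import random
from collections import Counter

def top_keywords(liked_prompts, n):
    """Count comma-separated keywords across liked prompts; return the n most common.

    Splitting on commas is an assumption about the prompt format.
    """
    counts = Counter(
        kw.strip()
        for prompt in liked_prompts
        for kw in prompt.split(",")
        if kw.strip()
    )
    return [kw for kw, _ in counts.most_common(n)]

def pick_uniform(keywords, rng=random):
    # Each shortlisted keyword is chosen with probability 1/n,
    # regardless of how images made with it were rated later.
    return rng.choice(keywords)

if __name__ == "__main__":
    liked = [  # hypothetical liked prompts
        "a castle, whimsical, watercolor",
        "a forest, whimsical, oil painting",
        "a harbor, watercolor, whimsical",
    ]
    shortlist = top_keywords(liked, 2)
    print(shortlist)  # ['whimsical', 'watercolor']
    print(pick_uniform(shortlist))
```

Thompson sampling replaces the fixed 1/n with selection frequencies that shift as new ratings accumulate.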

On 9/9, the "Discover the painter" experiment made me think, "Running img2prompt on generated images is also interesting because it leads to unexpected discoveries."

On 9/10, I converted all generated images that had been judged GOOD with img2prompt in one batch.

On 9/12, I had the computer generate prompts from random combinations of parts.

Thompson sampling took shape on 9/18.

---

This page is auto-translated from /nishio/トンプソンサンプリング採用の流れ using DeepL. If something looks interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.